Parsing Manually Detected and Normalized Disfluencies in Spoken Estonian

Author

  • Helen Nigol
Abstract

This paper reports an experiment in which an Estonian Constraint Grammar based syntactic analyzer is applied to transcribed speech. It analyzes the problems encountered when parsing disfluencies and measures how much manual normalization of disfluencies improves recall and precision compared with non-normalized utterances.

Similar resources

Disfluency Detection and Parsing of Transcribed Speech of Estonian

The paper introduces our strategy for adapting a rule based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar based parser was used for shallow syntactic analysis of spoken Estonian. The modification of grammar and additional methods improved the recall from 97.5% to 97.7% and precision...

Shallow Parsing of Spoken Estonian Using Constraint Grammar

In this paper we describe how we have adapted the syntactic analyzer of written Estonian to the spoken language. The Constraint Grammar shallow syntactic parser (Müürisep et al. 2003) was used for the automatic syntactic analysis of the corpus of Estonian spoken language (Hennoste et al. 2000). To adapt the parser, the clause boundary detection rules as well as some syntactic constraints had to...

Automatically enriching spoken corpora with syntactic information for linguistic studies

Syntactic parsing of speech transcriptions faces the problem of the presence of disfluencies that break the syntactic structure of the utterances. We propose in this paper two solutions to this problem. The first one relies on a disfluencies predictor that detects disfluencies and removes them prior to parsing. The second one integrates the disfluencies in the syntactic structure of the utteran...

What lies beneath: Semantic and syntactic analysis of manually reconstructed spontaneous speech

Spontaneously produced speech text often includes disfluencies which make it difficult to analyze underlying structure. Successful reconstruction of this text would transform these errorful utterances into fluent strings and offer an alternate mechanism for analysis. Our investigation of naturally-occurring spontaneous speaker errors aligned to corrected text with manual semanticosyntactic anal...

HCP with PSMA: A Robust Spoken Language Parser

"Spoken language" is a field of natural language processing, which deals with transcribed speech utterances. The processing of spoken language is much more complex and complicated than processing standard, grammatically correct natural language, and requires special treatment of typical speech phenomena called "disfluencies", like corrections, interjections and repetitions of words or ph...

Publication date: 2007